How to Unescape HTML in Java
Unescaping HTML in Java: Because Code Deserves to Breathe
Ever looked at &lt;java&gt;public static void main(String[] args) { ... }&lt;/java&gt;
and thought, "Wow, that's an ancient spell!"? Well, fear not, brave developer! Today, we shall unmask these cryptic symbols and return them to their former glory!
1. The Magic of StringEscapeUtils.unescapeHtml4()
Apache Commons Text brings us a powerful tool to unescape HTML like a pro! But first, let's summon it with Maven:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.10.0</version>
</dependency>
The Spell: unescapeHtml4()
- Takes an escaped HTML string and restores it to its true form.
- If you pass null, it gracefully returns null (because it's polite!).
- Recognizes all standard HTML 4.0 entities.
- If it encounters an unknown entity, it shrugs and leaves it alone.
Example
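import org.apache.commons.text.StringEscapeUtils;
// The input holds HTML entities (&lt; and &gt;) instead of literal angle brackets
String escapedString = "&lt;java&gt;public static void main(String[] args) { ... }&lt;/java&gt;";
String unEscapedHTML = StringEscapeUtils.unescapeHtml4(escapedString);
System.out.println(unEscapedHTML);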
Output:
<java>public static void main(String[] args) { ... }</java>
Ahhh, that's better! Now our Java code can breathe!
2. Rolling Our Own Unescaper (a.k.a. Doing It the Hard Way)
Want to impress your teammates by reinventing the wheel? Here's how you can build your own unescaper with plain Java!
The DIY Approach
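import java.util.HashMap;
import java.util.Map;
// Lookup table: escaped entity (with leading '&' and trailing ';') -> literal character
private static final Map<String, String> htmlEntities = new HashMap<>();
static {
    htmlEntities.put("&lt;", "<");
    htmlEntities.put("&gt;", ">");
    htmlEntities.put("&amp;", "&");
    htmlEntities.put("&quot;", "\"");
    htmlEntities.put("&nbsp;", "\u00a0"); // non-breaking space
    htmlEntities.put("&copy;", "\u00a9");
    htmlEntities.put("&reg;", "\u00ae");
    htmlEntities.put("&euro;", "\u20ac"); // euro sign is U+20AC, not U+20A0
}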
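public static String unescapeHTML(String source) {
    if (source == null) {
        return null; // null in, null out, like unescapeHtml4()
    }
    int skip = 0;
    boolean continueLoop;
    do {
        continueLoop = false;
        int i = source.indexOf("&", skip);
        if (i > -1) {
            int j = source.indexOf(";", i);
            if (j > i) {
                // Candidate entity, including the leading '&' and trailing ';'
                String entityToLookFor = source.substring(i, j + 1);
                String value = htmlEntities.get(entityToLookFor);
                if (value != null) {
                    source = source.substring(0, i) + value + source.substring(j + 1);
                    // Resume after the replacement so it is never decoded twice
                    // (otherwise "&amp;lt;" would wrongly end up as "<")
                    skip = i + value.length();
                } else {
                    // Unknown entity: leave it alone and keep scanning
                    skip = i + 1;
                }
                continueLoop = true;
            }
        }
    } while (continueLoop);
    return source;
}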
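A side note on the loop above: it rebuilds the whole string with substring() on every hit, which gets slow on long inputs. Below is a minimal sketch of a single-pass variant that appends to a StringBuilder instead. The class name SinglePassUnescaper and its tiny entity table are made up for illustration, and Map.of assumes Java 9 or newer.

```java
import java.util.Map;

final class SinglePassUnescaper {
    // Hypothetical, deliberately small entity table for illustration only.
    private static final Map<String, String> ENTITIES = Map.of(
            "&lt;", "<",
            "&gt;", ">",
            "&amp;", "&",
            "&quot;", "\"");

    static String unescape(String source) {
        if (source == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(source.length());
        int pos = 0;
        while (pos < source.length()) {
            int amp = source.indexOf('&', pos);
            if (amp < 0) {
                break; // no more candidates
            }
            out.append(source, pos, amp); // copy the plain text before '&'
            int semi = source.indexOf(';', amp);
            String value = semi > amp
                    ? ENTITIES.get(source.substring(amp, semi + 1))
                    : null;
            if (value != null) {
                out.append(value); // known entity: emit the literal character
                pos = semi + 1;
            } else {
                out.append('&');   // unknown or unterminated: keep the '&' as-is
                pos = amp + 1;
            }
        }
        out.append(source, pos, source.length()); // copy whatever is left
        return out.toString();
    }
}
```

Because decoded characters go straight into the output and are never scanned again, an input such as "&amp;lt;" comes out as "&lt;" rather than being decoded a second time.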
Example Usage
String input = "&lt;java&gt;public static void main(String[] args) { ... }&lt;/java&gt;";
String output = unescapeHTML(input);
System.out.println(output);
Output:
<java>public static void main(String[] args) { ... }</java>
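One thing the lookup-table approach does not cover is numeric character references such as &#60; or &#x3C;, which simply pass through unchanged. Here is a rough sketch of how they might be decoded with java.util.regex. The class name NumericRefDecoder and the method decode are invented for this example, and Matcher.replaceAll with a lambda assumes Java 9 or newer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericRefDecoder {
    // Group 1: hexadecimal digits (&#x3C;), group 2: decimal digits (&#60;).
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String source) {
        if (source == null) {
            return null;
        }
        return NUMERIC_REF.matcher(source).replaceAll(match -> {
            int codePoint = match.group(1) != null
                    ? Integer.parseInt(match.group(1), 16)
                    : Integer.parseInt(match.group(2), 10);
            // References outside the Unicode range are left untouched.
            String decoded = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : match.group();
            // Escape '$' and '\' so they are not read as replacement syntax.
            return Matcher.quoteReplacement(decoded);
        });
    }
}
```

Calling NumericRefDecoder.decode("&#60;java&#62;") gives back <java>. Running it before or after the named-entity lookup is a design choice this sketch leaves open.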
The Takeaway
- Use Apache Commons Text if you value your time.
- Build your own if you enjoy a bit of coding adventure.
- Either way, you now have the power to turn your HTML-escaped text back into its readable form!
Happy Coding!